## 'data.frame': 1599 obs. of 12 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.01200 Min. : 1.00 Min. : 6.00
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00
## Median :0.07900 Median :14.00 Median : 38.00
## Mean :0.08747 Mean :15.87 Mean : 46.47
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00
## Max. :0.61100 Max. :72.00 Max. :289.00
## density pH sulphates alcohol
## Min. :0.9901 Min. :2.740 Min. :0.3300 Min. : 8.40
## 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50
## Median :0.9968 Median :3.310 Median :0.6200 Median :10.20
## Mean :0.9967 Mean :3.311 Mean :0.6581 Mean :10.42
## 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10
## Max. :1.0037 Max. :4.010 Max. :2.0000 Max. :14.90
## quality
## Min. :3.000
## 1st Qu.:5.000
## Median :6.000
## Mean :5.636
## 3rd Qu.:6.000
## Max. :8.000
The median quality score is 6, and scores range from 3 to 8. The mean alcohol content is 10.42%.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
The fixed.acidity distribution appears roughly normal. The median is 7.9.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3900 0.5200 0.5278 0.6400 1.5800
The volatile.acidity distribution has more than one mode, with peaks near 0.4, 0.5 and 0.6.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
## feature
## 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14
## 132 33 50 30 29 20 24 22 33 30 35 15 27 18 21
## 0.15 0.16 0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
## 19 9 16 22 21 25 33 27 25 51 27 38 20 19 21
## 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 0.41 0.42 0.43 0.44
## 30 30 32 25 24 13 20 19 14 28 29 16 29 15 23
## 0.45 0.46 0.47 0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59
## 22 19 18 23 68 20 13 17 14 13 12 8 9 9 8
## 0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.7 0.71 0.72 0.73 0.74
## 9 2 1 10 9 7 14 2 11 4 2 1 1 3 4
## 0.75 0.76 0.78 0.79 1
## 1 3 1 1 1
The citric.acid distribution has more than one mode, with peaks at 0, 0.24 and 0.49.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
## feature
## 0.9 1.2 1.3 1.4 1.5 1.6 1.65 1.7 1.75 1.8 1.9 2 2.05 2.1 2.15
## 2 8 5 35 30 58 2 76 2 129 117 156 2 128 2
## 2.2 2.25 2.3 2.35 2.4 2.5 2.55 2.6 2.65 2.7 2.8 2.85 2.9 2.95 3
## 131 1 109 1 86 84 1 79 1 39 49 1 24 1 25
## 3.1 3.2 3.3 3.4 3.45 3.5 3.6 3.65 3.7 3.75 3.8 3.9 4 4.1 4.2
## 7 15 11 15 1 2 8 1 4 1 8 6 11 6 5
## 4.25 4.3 4.4 4.5 4.6 4.65 4.7 4.8 5 5.1 5.15 5.2 5.4 5.5 5.6
## 1 8 4 4 6 2 1 3 1 5 1 3 1 8 6
## 5.7 5.8 5.9 6 6.1 6.2 6.3 6.4 6.55 6.6 6.7 7 7.2 7.3 7.5
## 1 4 3 4 4 3 2 3 2 2 2 1 1 1 1
## 7.8 7.9 8.1 8.3 8.6 8.8 8.9 9 10.7 11 12.9 13.4 13.8 13.9 15.4
## 2 3 2 3 1 2 1 1 1 2 1 1 2 1 2
## 15.5
## 1
## 90%
## 3.6
The long tail is transformed to better show the distribution of residual.sugar. The distribution appears right-skewed: 90% of the wines have residual.sugar of at most 3.6 g/dm^3 (wines above 45 g/dm^3 are considered sweet).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
## 95%
## 0.1261
The long tail is transformed to better show the distribution of chlorides. The distribution appears right-skewed: 95% of the wines have chlorides of at most 0.1261 g/dm^3.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 14.00 15.87 21.00 72.00
## 95%
## 35
The long tail is transformed to better show the distribution of free.sulfur.dioxide. The distribution appears bimodal, with peaks at 7 and 17.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 22.00 38.00 46.47 62.00 289.00
## 95%
## 112.1
The long tail is transformed to better show the distribution of total.sulfur.dioxide. After the transformation, the distribution appears roughly normal.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.210 3.310 3.311 3.400 4.010
## feature
## 2.74 2.86 2.87 2.88 2.89 2.9 2.92 2.93 2.94 2.95 2.98 2.99 3 3.01 3.02
## 1 1 1 2 4 1 4 3 4 1 5 2 6 5 8
## 3.03 3.04 3.05 3.06 3.07 3.08 3.09 3.1 3.11 3.12 3.13 3.14 3.15 3.16 3.17
## 6 10 8 10 11 11 11 19 9 20 13 21 34 36 27
## 3.18 3.19 3.2 3.21 3.22 3.23 3.24 3.25 3.26 3.27 3.28 3.29 3.3 3.31 3.32
## 30 25 39 36 39 32 29 26 53 35 42 46 57 39 45
## 3.33 3.34 3.35 3.36 3.37 3.38 3.39 3.4 3.41 3.42 3.43 3.44 3.45 3.46 3.47
## 37 43 39 56 37 48 48 37 34 33 17 29 20 22 21
## 3.48 3.49 3.5 3.51 3.52 3.53 3.54 3.55 3.56 3.57 3.58 3.59 3.6 3.61 3.62
## 19 10 14 15 18 17 16 8 11 10 10 8 7 8 4
## 3.63 3.66 3.67 3.68 3.69 3.7 3.71 3.72 3.74 3.75 3.78 3.85 3.9 4.01
## 3 4 3 5 4 1 4 3 1 1 2 1 2 2
The pH distribution is normal.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9901 0.9956 0.9968 0.9967 0.9978 1.0040
The density distribution appears normal, and the difference between the minimum and the maximum is only 0.014 g/cm^3. For comparison, the density difference between ethanol and water is about 0.21 g/cm^3.
Ref : https://en.wikipedia.org/wiki/Ethanol
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.5500 0.6200 0.6581 0.7300 2.0000
## 95%
## 0.93
The long tail is transformed to better show the distribution of sulphates. After the transformation, the distribution appears roughly normal. 95% of the wines have sulphates of at most 0.93.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
The distribution for alcohol appears to be right skewed.
## feature
## 3 4 5 6 7 8
## 10 53 681 638 199 18
## [1] 0.9493433
Most wines in the data set (94.9%) have a quality score between 5 and 7. I will convert this feature to a factor for the multivariate analysis.
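The 94.9% figure can be recomputed directly from the quality counts printed above. This is only a small check, not the original chunk; the vector name `quality_counts` is introduced here for illustration.

```r
# Counts per quality score, copied from the table printed above.
quality_counts <- c("3" = 10, "4" = 53, "5" = 681, "6" = 638, "7" = 199, "8" = 18)

# Share of wines with a quality score of 5, 6 or 7.
share_5_to_7 <- sum(quality_counts[c("5", "6", "7")]) / sum(quality_counts)
print(share_5_to_7)

# The single most frequent quality score.
most_common <- names(quality_counts)[which.max(quality_counts)]
print(most_common)
```

The same counts also show that the most frequent single score is 5, with 6 close behind.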
ANS: There are 1599 wines in the data set, with 12 features.
## fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 7.4 0.70 0.00 1.9 0.076
## 2 7.8 0.88 0.00 2.6 0.098
## 3 7.8 0.76 0.04 2.3 0.092
## 4 11.2 0.28 0.56 1.9 0.075
## 5 7.4 0.70 0.00 1.9 0.076
## 6 7.4 0.66 0.00 1.8 0.075
## free.sulfur.dioxide total.sulfur.dioxide density pH sulphates alcohol
## 1 11 34 0.9978 3.51 0.56 9.4
## 2 25 67 0.9968 3.20 0.68 9.8
## 3 15 54 0.9970 3.26 0.65 9.8
## 4 17 60 0.9980 3.16 0.58 9.8
## 5 11 34 0.9978 3.51 0.56 9.4
## 6 13 40 0.9978 3.51 0.56 9.4
## quality
## 1 5
## 2 5
## 3 5
## 4 6
## 5 5
## 6 5
Input variables (based on physicochemical tests):

1. fixed acidity (tartaric acid - g / dm^3): most acids involved with wine are fixed or nonvolatile (do not evaporate readily)
2. volatile acidity (acetic acid - g / dm^3): the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste
3. citric acid (g / dm^3): found in small quantities, citric acid can add ‘freshness’ and flavor to wines
4. residual sugar (g / dm^3): the amount of sugar remaining after fermentation stops; it’s rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet
5. chlorides (sodium chloride - g / dm^3): the amount of salt in the wine
6. free sulfur dioxide (mg / dm^3): the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine
7. total sulfur dioxide (mg / dm^3): amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
8. density (g / cm^3)
9. pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale
10. sulphates (potassium sulphate - g / dm^3): a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
11. alcohol (% by volume): the percent alcohol content of the wine

Output variable (based on sensory data):

12. quality (score between 0 and 10)
ANS: The main feature of interest is the wine’s quality. I would like to investigate which variable(s) affect wine quality.
investigation into your feature(s) of interest?
ANS: I think smell, taste, touch and addictive content will affect the wine’s quality, so the features I chose for investigation are:
## fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 7.4 0.70 0.00 1.9 0.076
## 2 7.8 0.88 0.00 2.6 0.098
## 3 7.8 0.76 0.04 2.3 0.092
## 4 11.2 0.28 0.56 1.9 0.075
## 5 7.4 0.70 0.00 1.9 0.076
## 6 7.4 0.66 0.00 1.8 0.075
## free.sulfur.dioxide total.sulfur.dioxide density pH sulphates alcohol
## 1 11 34 0.9978 3.51 0.56 9.4
## 2 25 67 0.9968 3.20 0.68 9.8
## 3 15 54 0.9970 3.26 0.65 9.8
## 4 17 60 0.9980 3.16 0.58 9.8
## 5 11 34 0.9978 3.51 0.56 9.4
## 6 13 40 0.9978 3.51 0.56 9.4
## quality sourness
## 1 5 5.1800
## 2 5 5.4600
## 3 5 5.4784
## 4 6 8.0976
## 5 5 5.1800
## 6 5 5.1800
Yes, I created “sourness” from fixed.acidity and citric.acid to represent the sourness of the wine.
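The report does not show how sourness is computed. The six rows printed above are, however, consistent with a weighted sum of 0.7 times fixed.acidity plus 0.46 times citric.acid. The sketch below reproduces those six values under that assumption; the weights are inferred from the printed output, not confirmed by the original code, and the data frame name `wine_head` is hypothetical.

```r
# First six rows of the two source features, copied from the head() output above.
wine_head <- data.frame(
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4),
  citric.acid   = c(0.00, 0.00, 0.04, 0.56, 0.00, 0.00)
)

# Assumed weights (inferred from the printed sourness column, not from the source code).
wine_head$sourness <- 0.7 * wine_head$fixed.acidity + 0.46 * wine_head$citric.acid

print(wine_head)
```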
you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?
ANS: The distributions of citric.acid, volatile.acidity and free.sulfur.dioxide appear bimodal. I tidied the data by removing the X feature, which I am not interested in, and combined fixed.acidity and citric.acid into sourness for the next stage of the investigation.
## [1] 1255 840 538 820 834 888 1213 921 90 300 1152 901 1410 425
## [15] 599 1367 661 416 564 950 994 85 1530 590 496 1092 1509 333
## [29] 982 1531 721 1551 344 984 847 1576 753 349 618 454 1159 442
## [43] 92 838 32 302 1313 1505 1125 365 1420 307 226 943 893 736
## [57] 1412 218 1466 1042 1157 1575 700 481 331 1204 183 1436 10 802
## [71] 709 735 730 817 1342 1447 194 40 848 1194 731 827 159 1584
## [85] 1515 592 829 1353 1054 312 151 685 1181 609 1035 535 992 515
## [99] 136 1563
Top correlation values for quality:

1. alcohol: 0.476
2. volatile.acidity: -0.391
3. sulphates: 0.251
4. citric.acid: 0.226
##
## Pearson's product-moment correlation
##
## data: feature and quality
## t = -16.954, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4313210 -0.3482032
## sample estimates:
## cor
## -0.3905578
##
## Pearson's product-moment correlation
##
## data: feature and quality
## t = 9.2875, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1793415 0.2723711
## sample estimates:
## cor
## 0.2263725
##
## Pearson's product-moment correlation
##
## data: feature and quality
## t = 10.38, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2049011 0.2967610
## sample estimates:
## cor
## 0.2513971
##
## Pearson's product-moment correlation
##
## data: feature and quality
## t = 21.639, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4373540 0.5132081
## sample estimates:
## cor
## 0.4761663
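To make the test output easier to read, the t statistic and the 95% confidence interval for the alcohol correlation can be recovered from the correlation and the sample size alone. This sketch uses the standard formulas for a Pearson correlation test (t from r and the degrees of freedom, interval from the Fisher z-transformation); the variable names are introduced here for illustration.

```r
r  <- 0.4761663   # sample correlation between quality and alcohol (from the output above)
n  <- 1599        # number of wines
df <- n - 2       # degrees of freedom reported by the test

# t statistic for H0: true correlation equals 0.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transformation.
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(t_stat)
print(ci)
```

The results agree with the reported t = 21.639 and the interval 0.4373540 to 0.5132081 up to rounding of the input correlation.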
investigation. How did the feature(s) of interest vary with other features in the dataset?
ANS: The plots and correlation values show that sulphates, citric acid and alcohol relate positively to quality, while volatile acidity relates negatively to quality.
The plots for alcohol, sulphates and volatile acidity separate the three wine ratings well, but citric acid separates normal wine from good wine poorly.
(not the main feature(s) of interest)?
##
## Pearson's product-moment correlation
##
## data: featureX and featureY
## t = -26.489, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5856550 -0.5174902
## sample estimates:
## cor
## -0.5524957
##
## Pearson's product-moment correlation
##
## data: featureX and featureY
## t = 13.159, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2678558 0.3563278
## sample estimates:
## cor
## 0.31277
##
## Pearson's product-moment correlation
##
## data: featureX and featureY
## t = 4.4188, df = 1597, p-value = 1.059e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.06121189 0.15807276
## sample estimates:
## cor
## 0.1099032
ANS: I found that citric acid and volatile acidity are fairly strongly correlated. The correlations with citric acid are:
citric acid and volatile acidity: -0.5524957
citric acid and sulphates: 0.31277
citric acid and alcohol: 0.1099032
ANS: For the feature of interest, alcohol percentage has the highest correlation value (0.476).
Across every pair of features, free.sulfur.dioxide and total.sulfur.dioxide have the highest correlation value (about 0.66).
First I need to prepare alcohol.level for the multivariate plots.
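The original chunk that builds the grouped variables is not shown. A minimal sketch with `cut()` is given below, using the first ten rows from the `str()` output. The quality cut points (up to 5 is normal, 6 is good, 7 and above is excellent) and the lower alcohol cut point at 10% are inferred from the factor codes printed in the final `str()` output. The upper alcohol cut point at 12% and the exact labels of the two higher alcohol levels are assumptions made only for illustration.

```r
# First ten rows of alcohol and quality, copied from the str() output.
wine10 <- data.frame(
  alcohol = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  quality = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

# Ordered rating: quality <= 5 is normal, 6 is good, >= 7 is excellent (inferred).
wine10$wine_rating <- cut(
  wine10$quality,
  breaks = c(-Inf, 5, 6, Inf),
  labels = c("normal", "good", "excellent"),
  ordered_result = TRUE
)

# Ordered alcohol level: the break at 10 is inferred, the break at 12 is an assumption.
wine10$alcohol.level <- cut(
  wine10$alcohol,
  breaks = c(-Inf, 10, 12, Inf),
  labels = c("low alcohol", "medium-low alcohol", "medium alcohol"),
  right = FALSE,          # intervals are closed on the left, so 10 falls in the second level
  ordered_result = TRUE
)

print(wine10)
```

With these breaks the integer codes of both factors match the codes shown for the first ten rows in the final `str()` output.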
The plot shows that excellent wines mostly sit in the top left, good wines in the middle and normal wines in the bottom right.
The wine rating vs. alcohol.level vs. volatile.acidity plot shows that:
the share of excellent wine in the “medium” alcohol level is very high for volatile.acidity in the range 0.25-0.4.
## [1] "fixed.acidity" "volatile.acidity" "citric.acid"
## [4] "residual.sugar" "chlorides" "free.sulfur.dioxide"
## [7] "total.sulfur.dioxide" "density" "pH"
## [10] "sulphates" "alcohol" "quality"
## [13] "sourness" "wine_rating" "alcohol.level"
The wine rating vs. alcohol.level vs. citric.acid plot shows that the share of excellent wine in the “medium” alcohol level is very high for citric.acid at 0 and at 0.4.
The wine rating vs. alcohol.level vs. total.sulfur.dioxide plot shows that the share of excellent wine in the “medium” alcohol level is very high for total.sulfur.dioxide in the range 5-30.
The wine rating vs. alcohol.level vs. sulphates plot shows that the share of excellent wine in the “medium” alcohol level is very high for sulphates in the range 0.7-0.9.
No pattern is noticeable here.
investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?
The plots show that alcohol is the feature with the highest impact.
The wine rating vs. alcohol.level vs. volatile.acidity plot shows that the share of excellent wine in the “medium” alcohol level is very high for volatile.acidity in the range 0.25-0.4.
The wine rating vs. alcohol.level vs. citric.acid plot shows that the share of excellent wine in the “medium” alcohol level is very high for citric.acid at 0 and in the range 0.35-0.5.
The wine rating vs. alcohol.level vs. total.sulfur.dioxide plot shows that the share of excellent wine in the “medium” alcohol level is very high for total.sulfur.dioxide in the range 5-30.
The wine rating vs. alcohol.level vs. sulphates plot shows that the share of excellent wine in the “medium” alcohol level is very high for sulphates in the range 0.7-0.9.
The wine rating vs. alcohol.level vs. sourness vs. chlorides plot shows that it is hard to determine wine quality by the tongue (chlorides and sourness).
It is very surprising that smell (total.sulfur.dioxide) has an influence on the wine rating but taste (chlorides and sourness) does not.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
The alcohol distribution appears right-skewed: it peaks at 9.5%, slopes down toward 13.0%, and has very few wines between 8.0% and 9.0%. One possible explanation is that demand for wine is highest around 9.5% and tends to fall as the alcohol percentage rises.
## [1] 341 1459 740 310 746 485 16 57 963 571 582 630 686 79
## [15] 866 1433 1167 1022 1337 440 1250 851 801 1430 1126 781 1589 780
## [29] 1441 625 559 430 593 565 1590 1055 1056 1252 240 1113 752 226
## [43] 1583 1072 1573 615 246 1203 289 465 1321 687 779 365 111 177
## [57] 1461 671 1283 529 1189 743 1272 325 931 628 617 1201 526 1074
## [71] 1284 1175 962 34 668 72 109 1384 626 1247 5 1157 783 663
## [85] 1524 371 827 224 1460 698 1052 1436 1307 584 494 951 926 203
## [99] 1061 608
##
## Pearson's product-moment correlation
##
## data: redwine_data$quality and redwine_data$alcohol
## t = 21.639, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4373540 0.5132081
## sample estimates:
## cor
## 0.4761663
##
## Pearson's product-moment correlation
##
## data: redwine_data$quality and redwine_data$volatile.acidity
## t = -16.954, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4313210 -0.3482032
## sample estimates:
## cor
## -0.3905578
##
## Pearson's product-moment correlation
##
## data: redwine_data$quality and redwine_data$sulphates
## t = 10.38, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2049011 0.2967610
## sample estimates:
## cor
## 0.2513971
##
## Pearson's product-moment correlation
##
## data: redwine_data$quality and redwine_data$citric.acid
## t = 9.2875, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1793415 0.2723711
## sample estimates:
## cor
## 0.2263725
Alcohol percentage, sulphates and citric acid correlate positively with wine rating, but volatile acidity correlates negatively.
The variance of alcohol is much larger for good and excellent wines than for normal wines.
The variance of citric acid is much larger for good and normal wines than for excellent wines.
Top correlation values for quality:

1. alcohol: 0.476
2. volatile.acidity: -0.391
3. sulphates: 0.251
4. citric.acid: 0.226

Surprisingly, volatile.acidity and citric.acid are fairly strongly negatively correlated.
level?
## [1] "On any alcohol level data - wine rating vs volatile.acidity"
## redwin_data_at.alcoholitem$wine_rating: normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1800 0.4600 0.5900 0.5895 0.6800 1.5800
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1600 0.3800 0.4900 0.4975 0.6000 1.0400
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3000 0.3700 0.4055 0.4900 0.9150
## [1] "On low alcohol level data - wine rating vs volatile.acidity"
## redwin_data_at.alcoholitem$wine_rating: normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1900 0.4800 0.5900 0.5866 0.6700 1.2400
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2200 0.4000 0.5200 0.5173 0.6200 1.0400
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2100 0.2775 0.3450 0.3633 0.4325 0.5800
## [1] "On medium-low alcohol level data - wine rating vs volatile.acidity"
## redwin_data_at.alcoholitem$wine_rating: normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1800 0.4300 0.5800 0.5956 0.6950 1.5800
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1600 0.3800 0.4950 0.4965 0.6000 1.0200
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2100 0.3100 0.3600 0.3963 0.4800 0.9150
## [1] "On medium alcohol level data - wine rating vs volatile.acidity"
## redwin_data_at.alcoholitem$wine_rating: normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2600 0.5000 0.5900 0.5858 0.6675 1.0400
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1600 0.3525 0.4400 0.4720 0.5800 1.0100
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3075 0.3800 0.4170 0.5100 0.8500
## [1] "On any alcohol level data - wine rating vs citric.acid"
## redwin_data_at.alcoholitem$wine_rating: normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0800 0.2200 0.2378 0.3600 1.0000
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0900 0.2600 0.2738 0.4300 0.7800
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.3000 0.4000 0.3765 0.4900 0.7600
## [1] "On low alcohol level data - wine rating vs citric.acid"
## redwin_data_at.alcoholitem$wine_rating: normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0900 0.2200 0.2372 0.3400 1.0000
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1000 0.2400 0.2540 0.3825 0.7400
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0200 0.3250 0.4650 0.4325 0.5400 0.7200
## [1] "On medium-low alcohol level data - wine rating vs citric.acid"
## redwin_data_at.alcoholitem$wine_rating: normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0725 0.2350 0.2417 0.3900 0.7400
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0800 0.2650 0.2730 0.4325 0.7800
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.3000 0.3900 0.3769 0.4900 0.7600
## [1] "On medium alcohol level data - wine rating vs citric.acid"
## redwin_data_at.alcoholitem$wine_rating: normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0650 0.1000 0.2104 0.3100 0.7900
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1200 0.3250 0.3033 0.4600 0.6900
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.3150 0.4000 0.3704 0.4900 0.7600
## [1] "On any alcohol level data - wine rating vs total.sulfur.dioxide"
## redwin_data_at.alcoholitem$wine_rating: normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 23.75 45.00 54.65 78.00 155.00
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 23.00 35.00 40.87 54.00 165.00
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.00 17.00 27.00 34.89 43.00 289.00
## [1] "On low alcohol level data - wine rating vs total.sulfur.dioxide"
## redwin_data_at.alcoholitem$wine_rating: normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 30.00 53.00 61.39 88.00 153.00
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 25.00 38.00 45.30 63.25 136.00
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.00 23.25 29.00 28.92 36.50 45.00
## [1] "On medium-low alcohol level data - wine rating vs total.sulfur.dioxide"
## redwin_data_at.alcoholitem$wine_rating: normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 20.00 32.50 42.91 57.75 155.00
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.00 22.00 35.00 38.93 52.00 160.00
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.00 19.00 28.00 31.57 43.00 103.00
## [1] "On medium alcohol level data - wine rating vs total.sulfur.dioxide"
## redwin_data_at.alcoholitem$wine_rating: normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.00 15.25 25.50 40.35 61.75 113.00
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 18.00 33.00 39.04 47.75 165.00
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 14.75 25.00 38.05 45.75 289.00
## [1] "On any alcohol level data - wine rating vs sulphates"
## redwin_data_at.alcoholitem$wine_rating: normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.5200 0.5800 0.6185 0.6500 2.0000
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4000 0.5800 0.6400 0.6753 0.7500 1.9500
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3900 0.6500 0.7400 0.7435 0.8200 1.3600
## [1] "On low alcohol level data - wine rating vs sulphates"
## redwin_data_at.alcoholitem$wine_rating: normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.5200 0.5700 0.6216 0.6400 2.0000
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4300 0.5775 0.6250 0.6796 0.7200 1.9500
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5700 0.6500 0.7600 0.8033 0.8400 1.3600
## [1] "On medium-low alcohol level data - wine rating vs sulphates"
## redwin_data_at.alcoholitem$wine_rating: normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3700 0.5400 0.5900 0.6158 0.6700 1.2000
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4200 0.6000 0.6500 0.6851 0.7600 1.3600
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4700 0.6700 0.7400 0.7517 0.8300 1.1000
## [1] "On medium alcohol level data - wine rating vs sulphates"
## redwin_data_at.alcoholitem$wine_rating: normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3700 0.5200 0.5600 0.5888 0.6275 0.8400
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4000 0.5600 0.6200 0.6475 0.7200 1.0300
## --------------------------------------------------------
## redwin_data_at.alcoholitem$wine_rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3900 0.6400 0.7400 0.7309 0.8125 1.1300
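Per-rating summaries like the ones above can be produced with `by()`. The sketch below is not the original chunk: it uses only the first ten rows from the `str()` output, and the rating cut points are inferred from the factor codes shown there, so the numbers it prints differ from the full-data summaries.

```r
# First ten rows of volatile.acidity and quality, copied from the str() output.
wine10 <- data.frame(
  volatile.acidity = c(0.7, 0.88, 0.76, 0.28, 0.7, 0.66, 0.6, 0.65, 0.58, 0.5),
  quality          = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

# Ordered rating (cut points inferred from the printed factor codes).
wine10$wine_rating <- cut(
  wine10$quality,
  breaks = c(-Inf, 5, 6, Inf),
  labels = c("normal", "good", "excellent"),
  ordered_result = TRUE
)

# Six-number summary of volatile.acidity within each rating.
print(by(wine10$volatile.acidity, wine10$wine_rating, summary))

# The same grouping reduced to a single statistic per rating.
group_medians <- tapply(wine10$volatile.acidity, wine10$wine_rating, median)
print(group_medians)
```

To repeat this for one alcohol level at a time, the data frame can be filtered on alcohol.level before calling `by()`.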
At low alcohol percentages, excellent wines are rare and most wines are rated normal. At medium-low alcohol percentages, 75% of excellent wines have total.sulfur.dioxide below 43 mg / dm^3 or volatile.acidity below 0.48 g / dm^3, but most wines at this level are rated normal or good.
At medium alcohol percentages, 75% of excellent wines have total.sulfur.dioxide below 45.75 mg / dm^3 or sulphates above 0.64 g / dm^3, and most wines at this level are rated excellent or good.
## 'data.frame': 1599 obs. of 15 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## $ sourness : num 5.18 5.46 5.48 8.1 5.18 ...
## $ wine_rating : Ord.factor w/ 3 levels "normal"<"good"<..: 1 1 1 2 1 1 1 3 3 1 ...
## $ alcohol.level : Ord.factor w/ 3 levels "low alcohol"<..: 1 1 1 1 1 1 1 2 1 2 ...
The data set contains 1599 wines from 2009. I started by understanding the variables in the data set and tried to interpret them in terms of senses that humans can perceive. Surprisingly, I found no evidence that sour and salty tastes influence the quality of wine, whereas smell (total sulfur dioxide), addictive content (alcohol), volatile acidity, citric acid and sulphates do influence it.

At low alcohol percentages we hardly find any excellent wines. Most are rated normal, and some good wines can be found when total SO2 is in the range 5-60 and sulphates are in the range 0.53-0.73. At medium-low alcohol percentages, excellent wines can be found at low volatile acidity and total sulfur dioxide below 55, but most wines are rated normal or good. At medium alcohol percentages, a high share of excellent wines can be found at total sulfur dioxide below 50 and sulphates above 0.65, and most wines are rated excellent or good.

At first I struggled to build multivariate plots that clearly present the relationship between more than one feature and wine quality. I eventually found that it is easier once I create new variables that represent a feature as groups. I also could not clearly present the relationships of the selected features with geom_point, since the correlation value of each feature is not very high, but this became much better when I decided to use histograms.

Finally, I very much enjoyed doing this analysis. It helped me better understand how to pose questions and solve them.